
    Curriculum Guidelines for Undergraduate Programs in Data Science

    The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program met for the purpose of composing guidelines for undergraduate programs in data science. The group consisted of 25 undergraduate faculty from a variety of institutions in the United States, primarily from the disciplines of mathematics, statistics, and computer science. These guidelines are meant to provide some structure for institutions planning for or revising a major in data science.

    A High Throughput Genetic Screen Identifies New Early Meiotic Recombination Functions in Arabidopsis thaliana

    Meiotic recombination is initiated by the formation of numerous DNA double-strand breaks (DSBs) catalysed by the widely conserved Spo11 protein. In Saccharomyces cerevisiae, Spo11 requires nine other proteins for meiotic DSB formation; however, unlike Spo11, few of these are conserved across kingdoms. In order to investigate this recombination step in higher eukaryotes, we took advantage of a high-throughput meiotic mutant screen carried out in the model plant Arabidopsis thaliana. A collection of 55,000 mutant lines was screened, and spo11-like mutations, characterised by a drastic decrease in chiasma formation at metaphase I associated with an absence of synapsis at prophase, were selected. This screen led to the identification of two populations of mutants classified according to their recombination defects: mutants that repair meiotic DSBs using the sister chromatid, such as Atdmc1, or mutants that are unable to make DSBs, like Atspo11-1. We found that in Arabidopsis thaliana at least four proteins are necessary for driving meiotic DSB repair via the homologous chromosomes. These include the previously characterised DMC1 and the Hop1-related ASY1 proteins, but also the meiosis-specific cyclin SDS as well as the Arabidopsis Hop2 homologue AHP2. Analysing the mutants defective in DSB formation, we identified the previously characterised AtSPO11-1, AtSPO11-2, and AtPRD1 as well as two new genes, AtPRD2 and AtPRD3. Our data thus increase the number of proteins necessary for DSB formation in Arabidopsis thaliana to five. Unlike SPO11 and (to a lesser extent) PRD1, these two new proteins are poorly conserved among species, suggesting that the DSB formation mechanism, but not its regulation, is conserved among eukaryotes.

    A Guided Tour of Modern Regression Methods

    The statistical practitioner today who wants to find new methods to fit historical data is confronted by an often bewildering morass of acronyms. We will attempt, via a few examples, to shed some light on how techniques such as CART, MARS, GAM, PLS, PCR, and ANN work and how they can be used effectively. This paper is based on an invited tutorial on modern regression methods given at the 1995 Fall Technical Conference in St. Louis. KEYWORDS: nonparametric regression; function approximation; neural networks; generalized additive models; tree-based regression. 1 Introduction: Our aim in this paper is to provide an introduction to several of the more popular regression-based techniques currently used by data analysts. Our intent is to familiarize the reader with each technique, not to provide an in-depth analysis of each. We will illustrate the techniques via examples, referring the reader to the vast bibliography on the subject for more details on the estimation and inference properties of...
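    As a rough illustration of how these acronyms map onto working code, the sketch below fits a few of the named methods (CART, PCR, PLS, and an ANN) on synthetic data. It assumes scikit-learn is available; MARS and GAM have no scikit-learn implementation and would need third-party packages such as py-earth or pyGAM. This is a generic workflow for illustration, not the paper's own examples.

    # Minimal sketch: fitting several of the regression methods named above
    # (CART, PCR, PLS, ANN) on a synthetic data set with scikit-learn.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 6))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=500)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    models = {
        "CART": DecisionTreeRegressor(max_depth=5),
        "PCR": make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression()),
        "PLS": PLSRegression(n_components=3),
        "ANN": make_pipeline(StandardScaler(),
                             MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)),
    }

    for name, model in models.items():
        model.fit(X_train, y_train)
        print(f"{name}: R^2 = {model.score(X_test, y_test):.3f}")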

    Multicollinearity: A tale of two nonparametric regressions

    The most popular form of artificial neural network, feedforward networks with sigmoidal activation functions, and a new statistical technique, multivariate adaptive regression splines (MARS), can both be classified as nonlinear, nonparametric function estimation techniques, and both show great promise for fitting general nonlinear multivariate functions. In comparing the two methods on a variety of test problems, we find that MARS is in many cases both more accurate and much faster than neural networks. In addition, MARS is interpretable due to the choice of basis functions which make up the final predictive equation. This suggests that MARS could be used on many of the applications where neural networks are currently being used. However, MARS exhibits problems in choosing among predictor variables when multicollinearity is present. Due to their redundant architecture, neural networks do not share this problem and are better able to predict in this situation. To improve...
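    To illustrate the multicollinearity point, the sketch below builds two nearly identical predictors and compares how a greedy, selection-based fit and a neural network assign credit. Since MARS has no scikit-learn implementation, a CART tree stands in for the selection-based method here; this substitution is an assumption for illustration, not the paper's experimental setup.

    # Minimal sketch (not the paper's experiments): x1 and x2 are nearly collinear
    # and both carry the signal. A greedy, selection-based fit (a CART tree standing
    # in for MARS) tends to credit one of them almost arbitrarily, while a neural
    # network's redundant weights spread across both.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(1)
    n = 1000
    x1 = rng.normal(size=n)
    x2 = x1 + 0.01 * rng.normal(size=n)        # nearly a copy of x1
    X = np.column_stack([x1, x2])
    y = np.sin(x1) + 0.1 * rng.normal(size=n)

    tree = DecisionTreeRegressor(max_depth=4).fit(X, y)
    print("tree feature importances:", tree.feature_importances_)

    nn = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0).fit(X, y)
    # Sum of absolute input-to-hidden weights per predictor: both get nonzero credit.
    print("NN input weight mass:", np.abs(nn.coefs_[0]).sum(axis=1))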

    Estimating Prediction Intervals for Artificial Neural Networks

    Neural networks can be viewed as nonlinear models in which the weights are parameters to be estimated. In general, two parameter estimation methods are used: nonlinear regression, corresponding to the standard backpropagation algorithm, and Bayesian estimation, in which the model parameters are treated as random variables drawn from a prior distribution that is updated based on the observed data. These two estimation methods suggest different ways of calculating prediction intervals for neural networks. We present some preliminary observations comparing the ability of the two methods to provide accurate prediction intervals. 1 Introduction: Artificial neural networks (ANN) are being used with increasing frequency as an alternative to traditional models for a range of applications, including model-based control. Unfortunately, ANN models rarely provide any indication of the accuracy or reliability of their predictions. (One exception is for radial basis functions; see Leonard ...
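    One common way to attach intervals to neural-network predictions, sketched below, is a bootstrap ensemble. This is a generic resampling approach offered only as an illustration, not necessarily the nonlinear-regression or Bayesian constructions compared in the paper.

    # Minimal sketch: bootstrap prediction intervals for a small neural network.
    # Each refit captures model uncertainty; resampled residuals add noise so the
    # percentiles approximate a prediction interval rather than a mean interval.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(2)
    n = 400
    X = rng.uniform(-3, 3, size=(n, 1))
    y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=n)
    X_new = np.linspace(-3, 3, 5).reshape(-1, 1)

    B = 30                                       # number of bootstrap refits
    preds = np.empty((B, len(X_new)))
    for b in range(B):
        idx = rng.integers(0, n, size=n)         # resample the training data
        model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=b)
        model.fit(X[idx], y[idx])
        resid = y[idx] - model.predict(X[idx])   # in-sample residuals
        preds[b] = model.predict(X_new) + rng.choice(resid, size=len(X_new))

    lower, upper = np.percentile(preds, [2.5, 97.5], axis=0)
    for x, lo, hi in zip(X_new[:, 0], lower, upper):
        print(f"x = {x:+.2f}: approx. 95% prediction interval [{lo:.2f}, {hi:.2f}]")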